ABSTRACT



From a Principal Component Analysis (PCA) of 78 z ~ 3 high quality quasar spectra in the SDSS-DR7, we derive the principal
components characterizing the QSO continuum over the full wavelength range available. The shape of the mean continuum is similar
to that measured at low-z (z ~ 1), but the equivalent widths of the emission lines are larger at low redshift. We calculate the correlation
between fluxes at different wavelengths and find that the emission line fluxes in the red part of the spectrum are correlated with those
in the blue part. We construct a projection matrix to predict the continuum in the Lyman-α forest from the red part of the spectrum.
We apply this matrix to quasars in the SDSS-DR7 to derive the evolution with redshift of the mean flux in the Lyman-α forest due
to the absorption by the intergalactic neutral hydrogen. A change in the evolution of the mean flux is apparent around z ~ 3, in
the sense of a steeper decrease of the mean flux at higher redshifts. The same evolution is found when the continuum is estimated
from the extrapolation of a power-law continuum fitted in the red part of the quasar spectrum, provided a correction, derived from simple
simulations, is applied. Our findings are consistent with previous determinations using high spectral resolution data. We provide the
PCA eigenvectors over the wavelength range 1020-2000 Å and the distribution of their weights, which can be used to simulate QSO
mock spectra.

Key words. Methods: numerical — galaxies: intergalactic medium, quasars 



1. Introduction 

At high redshift, most of the baryons are located in the
intergalactic medium (IGM; e.g. Petitjean et al. 1993) where they are
highly ionized by the UV-background produced by galaxies and
QSOs (Gunn & Peterson 1965), at least since z ~ 6 (Fan et al.
2006; Becker et al. 2007). The large absorption cross section of
the H I Lyman-α transition implies that the small fraction of
neutral hydrogen in the IGM produces the so-called Lyman-α forest
composed of numerous absorption lines detected in the spectra
of high redshift quasars (see Lynds 1971; Rauch 1998 for a
review).



Analytical models (Bi et al. 1992) and numerical N-body
simulations (Cen et al. 1994; Petitjean et al. 1995; Zhang et al.
1995; Hernquist et al. 1996; Theuns et al. 1998; Riediger et al.
1998) have been very successful at reproducing the properties
of the Lyman-α forest as measured from high spectral resolution,
high SNR data obtained with the Ultraviolet and Visual
Echelle Spectrograph (UVES) on the Very Large Telescope
(VLT, e.g. Bergeron et al. 2004; Kim et al. 2007) and HIRES on
the Keck telescope (e.g. Hu et al. 1995). The overall picture
is that lower column-density H I absorption lines trace the
filaments of the 'cosmic web', and higher column-density
absorption lines trace the surroundings of galaxies. Detailed
studies of absorption line properties and of their clustering
properties along one or several adjacent lines of sight give
additional constraints on the ionization history, correlation length,
matter power spectrum, etc. (see e.g. Petitjean et al. 1998;
Croft et al. 1998; McDonald et al. 2005; Theuns & Srianand
2006). The next generation of quasar surveys, from BOSS
(SDSS-III, Schlegel et al. 2007; Eisenstein et al. 2011) to
BigBOSS (Schlegel et al. 2009), should provide the first detection
of Baryonic Acoustic Oscillations in the IGM at z ~ 2-3
(Slosar et al. 2009; White et al. 2010).

An important quantity to measure in a quasar spectrum is
the mean amount of absorption in the Lyman-α forest, $D_A$,
defined as $D_A = 1 - \langle F \rangle$ (Oke & Korycansky 1982),
where $F$ is the quasar normalized flux, $F = f_{\rm obs}/f_{\rm cont}$,
$f_{\rm obs}$ is the observed flux and $f_{\rm cont}$ is the estimated
unabsorbed continuum flux. The absorption can be defined as well by
the mean effective optical depth,
$\tau_{\rm eff} = -\ln \langle F \rangle$. These quantities are
sensitive to the physical properties of the IGM and have been used to
constrain $\Omega_b$ (Rauch 1998; Tytler et al. 2004), the ionization
history (Rauch et al. 1997; Kirkman et al. 2005; Bolton et al. 2005;
Bolton & Haehnelt 2007; Prochaska et al. 2009) and in particular
the He II reionization (Bernardi et al. 2003; Theuns et al.
2002). The latter could possibly induce a dip in the evolution
with redshift of $\tau_{\rm eff}$ at z ~ 3.2 (Schaye et al. 2000).
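
The two definitions above can be illustrated with a short numerical
sketch. It assumes the observed flux and the continuum estimate are
already sampled on the same pixels; the function name and the toy
values are hypothetical and only serve to show how $D_A$ and
$\tau_{\rm eff}$ follow from the mean normalized flux.

```python
import numpy as np

def mean_absorption(f_obs, f_cont):
    """Return (<F>, D_A, tau_eff) for one stretch of Lyman-alpha forest.

    f_obs  : observed flux per pixel
    f_cont : estimated unabsorbed continuum on the same pixels
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_cont = np.asarray(f_cont, dtype=float)
    F = f_obs / f_cont            # normalized flux
    mean_F = F.mean()             # <F>
    D_A = 1.0 - mean_F            # mean amount of absorption
    tau_eff = -np.log(mean_F)     # mean effective optical depth
    return mean_F, D_A, tau_eff

# Toy example (made-up numbers, not data from the paper).
f_cont = np.full(4, 2.0)
f_obs = np.array([2.0, 1.0, 1.5, 1.5])
mean_F, D_A, tau_eff = mean_absorption(f_obs, f_cont)
print(f"<F> = {mean_F:.3f}, D_A = {D_A:.3f}, tau_eff = {tau_eff:.3f}")
```

Because both quantities depend on $f_{\rm cont}$ only through the
ratio $f_{\rm obs}/f_{\rm cont}$, any systematic error in the continuum
level propagates directly into $D_A$ and $\tau_{\rm eff}$.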

Bernardi et al. (2003) first discovered such a dip in the
evolution of the effective optical depth in the Lyman-α forest
using SDSS spectra. The existence of a feature has since been
confirmed from high-resolution studies (Faucher-Giguère et al.
2008; Dall'Aglio et al. 2008), at a more modest statistical
significance but at a coincident redshift.

Intermediate resolution data have been used as well
(McDonald et al. 2005; Dall'Aglio et al. 2009) and the feature
is not detected. However, the methods used may not be totally
appropriate and in the present paper, we come back to this point.
Note that Faucher-Giguère et al. (2008) cautioned that (i) the dip
interpretation is only valid if one insists on fitting a single power
law to the background evolution, and there is no clear physical
motivation for doing so and (ii) this feature is not necessarily due
to He II reionization and other interpretations are also possible.

The definition of the unabsorbed quasar continuum over
the Lyman-α forest is a critical issue (e.g. Tytler et al. 2004;
Kim et al. 2007). For low resolution spectra, most analyses first
define a continuum redwards of the QSO Lyman-α emission,
where there are only a few absorption lines, and extrapolate
the shape of the continuum into the Lyman-α forest region
(see Section 2.1 for more details). It is most commonly assumed
that the QSO continuum in regions where there is no emission
line is a power-law that can be extrapolated easily. However,
this assumption usually neglects weak emission lines both in
the red and, more importantly, within the Lyman-α forest
region. Because of this, Suzuki et al. (2005, S05) have applied a
Principal Component Analysis (PCA) to HST spectra of quasars
at z < 1. Quasar continua are described with a limited set of
eigenvectors and a controlled sample is used to define a projection
matrix that allows the continuum in the Lyman-α forest to be
recovered from the shape of the continuum in the red part.

The shape of the quasar continuum can evolve from z < 1 to
z ~ 3. In such a case, a PCA at z < 1 would not give a fair
representation of the quasar continuum at z ~ 2-3. In this paper, after
describing the procedures in Section 2, we take advantage of the
large database provided by SDSS-DR7 to define a large enough
sample of quasars at z ~ 3 on which we can apply the same
procedure as in S05 (Section 3). New eigenvectors and a new projection
matrix are generated and then used to predict the continuum of
all SDSS-DR7 spectra. We apply the method to the determination
of the evolution with redshift of the mean flux in the Lyman-α
forest (Section 4) and discuss the significance of the bump at
z ~ 3.2-3.4 before drawing our conclusions in Section 5.

2. Procedures 

2.1. Different methods to estimate the QSO continuum in low 
resolution spectra 

The methods used to estimate the continuum in low resolution
spectra can be broadly classified as follows:

1. A direct estimate of the continuum in the Lyman-α region:

- Using a spline interpolation: A cubic spline is interpolated
on adaptive intervals between observed data
points in the forest to construct a local continuum. A
correction is then applied to take into account the fact
that these data points can be affected by some absorption.
Dall'Aglio et al. (2008, 2009) apply a systematic
correction which accounts for resolution effects and line
blending and is estimated from idealized Monte-Carlo
simulated spectra. This approach however neglects the
possibility of continuous absorption from the smooth
IGM (rather than discrete absorbers) as well as
correlations from large-scale structure. At high redshift,
continuous absorption can be important and cause the
true continuum to be underestimated even after applying
the Monte-Carlo method to high-resolution data.
Faucher-Giguère et al. (2008) developed an alternative
method to correct the continuum placement using
cosmological simulations and showed that this effect is
actually important at the >10% level at z = 4.

- Taking into account the difference between the continuum
and absorption wavelength dependencies (e.g.
Bernardi et al. 2003; Prochaska et al. 2009): The continuum
is a property of the quasar and depends only, to first
order, on the rest-frame wavelength
($\lambda_r = \lambda/(1 + z_{\rm em})$),
while the absorption depends on the redshift
($z_{\rm abs} = \lambda/1215.6701 - 1$) only. Thus, if one
separates the dependencies in the flux,
$F(\lambda_r, z) = C(\lambda_r) \exp(-\tau(z))$, both
quantities can be recovered in principle.

2. Using the red part of the QSO spectrum to predict the blue
part:

- A power law is adjusted to the red part of the spectrum
in regions free of emission and absorption lines (see
Section 2.2). The power law is then simply extrapolated
over the Lyman-α forest wavelength range. This procedure
does not account for the presence of weak emission
lines in the Lyman-α forest region.

- A Principal Component Analysis applied to a reference
sample describes the continuum of a quasar spectrum as
a linear combination of eigenvectors. A projection matrix
is generated and used to translate the weights of the
eigenvectors describing the red side of the spectrum into
the weights of the eigenvectors for the whole spectrum.
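
The two wavelength dependencies used in the second item of the first
family (rest-frame wavelength for the continuum, absorber redshift for
the absorption) can be made concrete with a minimal sketch. The
function names and the example pixel are hypothetical; only the
Lyman-α rest wavelength, 1215.6701 Å, is taken from the text.

```python
LYA_REST = 1215.6701  # H I Lyman-alpha rest wavelength in Angstrom (from the text)

def rest_wavelength(lam_obs, z_em):
    """Rest-frame wavelength of a pixel in the frame of the quasar."""
    return lam_obs / (1.0 + z_em)

def absorber_redshift(lam_obs):
    """Redshift at which Lyman-alpha absorption falls on this observed pixel."""
    return lam_obs / LYA_REST - 1.0

# Example pixel (illustrative numbers): observed at 4500 A in the
# spectrum of a quasar at z_em = 2.862.
lam_obs = 4500.0
z_em = 2.862
lam_r = rest_wavelength(lam_obs, z_em)
z_abs = absorber_redshift(lam_obs)
print(f"rest wavelength = {lam_r:.1f} A, absorber redshift = {z_abs:.3f}")
```

The same observed pixel therefore carries two labels, one tied to the
quasar and one tied to the intervening gas, which is what allows
$C(\lambda_r)$ and $\tau(z)$ to be separated when many quasars at
different emission redshifts are combined.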

In this paper, we focus on the last two methods, in which
information provided by absorption-free regions redwards of the
Lyman-α emission line is used to predict the continuum in the
forest. Both methods first require an estimate of the quasar emission
redshift. Our procedure to estimate this redshift and to provide
a power-law continuum is described in the next sub-section. We
then describe the PCA method in Section 2.3.

2.2. Determination of the position of the emission lines and 
the power-law continuum 

We derive a redshift of the quasar using the C IV and C III] emission
lines in order to be able to estimate the position of these lines
and to avoid them when fitting the power-law. This is therefore
not an attempt to derive the exact systemic quasar redshift, which
is known to be shifted compared to the C IV-C III] redshift (e.g.
Vanden Berk et al. 2001; Hennawi & Prochaska 2007).

Here we assume that the continuum redwards of the Lyman-α
emission line can be described as the sum of a power-law
component and a gaussian function for each of the C IV and C III]
emission lines. The redshift and the power-law component are
estimated as follows (see Fig. 1 for a typical example at
$z_{\rm em}$ = 2.862):

1. After convolving the whole spectrum with a gaussian filter
of fixed FWHM = 250 km s$^{-1}$, the position where the flux is
maximum is associated with the QSO Lyman-α emission line.
This gives a first rough estimate of the redshift, $z_1$.

2. The average of the positions of the maximum flux within a
window of 100 Å in the rest frame of the quasar (at z = $z_1$)
around the C IV and C III] emission lines provides a second
redshift estimate, $z_2$. This redshift should be more accurate
than $z_1$ because the peak of the Lyman-α emission is a poor
estimate of the redshift due to the presence of the Lyman-α
forest and the blending with N V λ1240.

3. A power-law component of the continuum is fitted using
windows devoid of emission lines between 1430-1500 Å,
1600-1830 Å and 2000-2500 Å in the rest-frame (see grey
windows in Fig. 1). This component is subtracted from the
spectrum.

4. Finally, the C IV and C III] emission lines are simultaneously
fitted with gaussian functions. The width and amplitude are
independent parameters, but the two gaussian functions are
bound to have the same redshift, $z_3$.

Fig. 1. Illustration of the method used to estimate the emission redshift and power-law continuum of quasars (the quasar shown
is SDSS J012156.03+144823.9). Grey areas indicate the regions used to fit the power-law and the red line is the estimate of the
continuum (power-law + C IV and C III] emission lines). Vertical dashed lines indicate the position of emission lines.



The red line in Fig. 1 is the sum of the power law and the two
emission lines. The extrapolation of the power-law component
bluewards of the Lyman-α emission provides a first estimate of
the quasar continuum (red line in Fig. 1).
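
The power-law step (step 3) and its extrapolation into the forest can
be sketched as follows. This is a minimal illustration, not the
authors' code: the text does not say which estimator was used, so the
fit below is an ordinary least-squares line in log-log space, and the
function names and the synthetic spectrum are hypothetical. Only the
three rest-frame windows are taken from the text.

```python
import numpy as np

# Rest-frame windows (Angstrom) quoted in the text as free of emission lines.
WINDOWS = ((1430.0, 1500.0), (1600.0, 1830.0), (2000.0, 2500.0))

def fit_power_law(wave_rest, flux, windows=WINDOWS):
    """Fit flux = amplitude * wave_rest**alpha inside the given windows.

    Assumption: a straight-line fit in log-log space; the text does not
    state which estimator was actually used.
    """
    wave_rest = np.asarray(wave_rest, dtype=float)
    flux = np.asarray(flux, dtype=float)
    mask = np.zeros(wave_rest.shape, dtype=bool)
    for lo, hi in windows:
        mask |= (wave_rest >= lo) & (wave_rest <= hi)
    mask &= flux > 0.0  # the logarithm needs positive fluxes
    alpha, log_amp = np.polyfit(np.log(wave_rest[mask]), np.log(flux[mask]), 1)
    return np.exp(log_amp), alpha

def power_law(wave_rest, amplitude, alpha):
    """Evaluate the fitted power law at arbitrary rest-frame wavelengths."""
    return amplitude * np.asarray(wave_rest, dtype=float) ** alpha

# Synthetic spectrum: a pure power law plus one gaussian emission line
# near C IV, which falls outside the fitting windows.
wave = np.arange(1020.0, 2500.0, 1.0)
true_continuum = 5.0e4 * wave ** -1.5
flux = true_continuum + 3.0 * np.exp(-0.5 * ((wave - 1549.0) / 10.0) ** 2)

amplitude, alpha = fit_power_law(wave, flux)

# Extrapolate the fit bluewards of Lyman-alpha, over the forest region.
forest = (wave >= 1020.0) & (wave < 1216.0)
extrapolated = power_law(wave[forest], amplitude, alpha)
print(f"fitted slope alpha = {alpha:.3f}")
```

On real spectra the windows would first be shifted to the observed
frame with the redshift estimate $z_2$, and the extrapolation would
miss any weak emission lines inside the forest, which is the limitation
the PCA approach is meant to address.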



2.3. Principal Component Analysis 

We summarize here the main steps of the method as described in
Francis et al. (1992) and Suzuki et al. (2005).

2.3.1. Reconstructed continuum 

A representative sample of quasar spectra at the redshift of
interest must be gathered for which it is possible to define a true
continuum, $q_i(\lambda)$, i.e. unspoiled by intervening absorption. S05
used HST spectra at z < 1 because the IGM is sparse at these
redshifts and thus the continuum can be easily interpolated above
absorption lines. The sample of SDSS-DR7 quasars we used has
a mean emission redshift of z ~ 2.9 and is defined in Section 3.
We derived the true continuum, $q_i(\lambda)$, by eye and used these fitted
quasar continua in the following.

A covariance matrix $V$ is first calculated for the $N$ QSOs in
the sample as:

\[ V(\lambda_m, \lambda_n) = \frac{1}{N-1} \sum_{i=1}^{N} \left( q_i(\lambda_m) - \mu(\lambda_m) \right) \left( q_i(\lambda_n) - \mu(\lambda_n) \right), \tag{1} \]

where $\mu(\lambda) = \frac{1}{N} \sum_{i=1}^{N} q_i(\lambda)$ is the mean quasar continuum.

The principal components are found by decomposing the
covariance matrix $V$ into the product of the orthonormal matrix $P$,
which is composed of eigenvectors, and the diagonal matrix $\Lambda$
containing the eigenvalues:

\[ V = P^{-1} \Lambda P. \tag{2} \]

We call the eigenvectors (i.e. the columns of the matrix $P$) the
principal components, $\xi_j$. The principal components are ordered
according to the amount of variance in the training set they
can accommodate, such that the first principal component is the
eigenvector which has the largest eigenvalue.

The distribution of the weights, $c_j$, of the jth principal
component in Eq. 4 is found from the distribution of the $c_{ij}$ for all
$i = 1..N$ QSOs of the sample:

\[ c_{ij} = \int_{1020\,\mathrm{\AA}}^{2000\,\mathrm{\AA}} \left( q_i(\lambda) - \mu(\lambda) \right) \xi_j(\lambda) \, d\lambda. \tag{3} \]

Note that the upper limit of the integration is larger here
compared to S05. This is discussed further in Section 3. A mock
continuum can now be constructed using Eq. 4 over the rest frame
wavelength range of the spectra in the sample:

\[ q(\lambda) \sim \mu(\lambda) + \sum_{j=1}^{m} c_j \xi_j(\lambda). \tag{4} \]

2.3.2. Predicted continuum

The goal is to quantify the relationship between the red and blue
sides of the spectra in the sample. The first $m$ principal
components $\xi_j(\lambda)$ and their weights, $c_{ij}$, are derived as described
above, using the whole rest wavelength range, 1020 to 2000 Å.
Another set of $m$ principal components, $\zeta_j(\lambda)$, and their weights,
$d_{ij}$, are defined using only the red rest wavelength range, 1216
to 2000 Å. Finally, we solve linear equations to find a projection
matrix relating $c_{ij}$ and $d_{ij}$. Weights can be written in the $N \times m$
matrix form $C = c_{ij}$ and similarly for $D$. We then use singular
value decomposition techniques (Press et al. 1992) to derive the
$m \times m$ projection matrix $X = x_{ij}$ translating weights found with
the red side only into the weights for the whole spectrum:

\[ C = D \cdot X. \tag{5} \]

Once matrix $X$ is known, we can estimate the continuum over
the Lyman-α forest for any quasar spectrum from the red part of
the spectrum. We proceed in three steps. The weights for the red
spectrum are found,

\[ b_j = \int_{1216\,\mathrm{\AA}}^{2000\,\mathrm{\AA}} \left( q(\lambda) - \mu(\lambda) \right) \zeta_j(\lambda) \, d\lambda. \tag{6} \]

The weights from the red side, $b_j$, are translated to weights for
the whole spectrum, using

\[ a_j = \sum_{k=1}^{m} b_k x_{kj}. \tag{7} \]

Then the continuum for the whole spectrum is built as:

\[ p(\lambda) = \mu(\lambda) + \sum_{j=1}^{m} a_j \xi_j(\lambda). \tag{8} \]
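
The chain from Eq. 1 to Eq. 8 can be sketched numerically on a toy
training set. Several assumptions are made here and are not taken from
the paper: spectra are sampled on a common uniform wavelength grid, the
integrals of Eqs. 3 and 6 are replaced by plain sums over pixels, the
projection matrix of Eq. 5 is obtained with `numpy.linalg.lstsq` (an
SVD-based least-squares solver), and the synthetic continua are built
from only three made-up spectral shapes so that m = 3 suffices. All
function names are hypothetical.

```python
import numpy as np

def pca_basis(Q, m):
    """Mean, first m eigenvectors (as columns) and weights of spectra Q (N x n_pix)."""
    mu = Q.mean(axis=0)
    V = np.cov(Q, rowvar=False)              # Eq. 1 (1/(N-1) normalisation)
    eigval, eigvec = np.linalg.eigh(V)       # Eq. 2; eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:m]     # keep the m largest eigenvalues
    xi = eigvec[:, order]                    # n_pix x m
    weights = (Q - mu) @ xi                  # discrete version of Eq. 3, N x m
    return mu, xi, weights

def projection_matrix(C, D):
    """Least-squares solution X of C = D X (Eq. 5)."""
    X, *_ = np.linalg.lstsq(D, C, rcond=None)
    return X

def predict_continuum(q_red, mu, xi, mu_red, zeta, X):
    """Predict the full continuum from the red side only (Eqs. 6 to 8)."""
    b = (q_red - mu_red) @ zeta              # Eq. 6
    a = b @ X                                # Eq. 7
    return mu + xi @ a                       # Eq. 8

# Toy training set: a power-law-like baseline plus three random shapes.
rng = np.random.default_rng(0)
wave = np.linspace(1020.0, 2000.0, 197)
red = wave >= 1216.0
shapes = np.stack([
    np.exp(-0.5 * ((wave - 1216.0) / 25.0) ** 2),   # line near Lyman-alpha
    np.exp(-0.5 * ((wave - 1549.0) / 25.0) ** 2),   # line near C IV
    (wave / 1500.0) ** -1.0 - 1.0,                  # slope variation
])
base = (wave / 1500.0) ** -1.5
Q = base + rng.normal(size=(60, 3)) @ shapes

m = 3
mu, xi, C = pca_basis(Q, m)                  # full range, 1020-2000 A
mu_red, zeta, D = pca_basis(Q[:, red], m)    # red side only, 1216-2000 A
X = projection_matrix(C, D)

# A new spectrum from the same family, predicted from its red side.
q_new = base + rng.normal(size=3) @ shapes
p = predict_continuum(q_new[red], mu, xi, mu_red, zeta, X)
print("max prediction error:", np.abs(p - q_new).max())
```

In this idealised setting the prediction is exact to numerical
precision, because every blue feature has a visible red counterpart.
With real continua the red and blue sides are only partially
correlated, so Eq. 5 holds in a least-squares sense and the quality of
the prediction has to be assessed separately.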

3. New PCA continuum at z ~ 3 

The eigenvectors and coefficients derived by S05 at low
redshift have been used to generate mock spectra in order to
test different analyses at high redshift (Dall'Aglio et al. 2008;
Kirkman et al. 2005). To our knowledge, one attempt has also
been made to derive a similar decomposition at high redshift
using SDSS spectra (McDonald et al. 2005). The latter authors
comment that this continuum determination is robust enough to
infer the mean flux evolution, but unstable as far as the Lyman-α
power spectrum is concerned. However, the decomposition is not
performed on a well controlled training sample as in S05 and in
the present work (see below), and they therefore do not provide a
projection matrix. The analysis performed by Yip et al. (2004)
is closer to our purpose. They applied a PCA to the 16,707
Sloan Digital Sky Survey DR1 quasar spectra (0.08 < z < 5.41)
and reported that the spectral classification depends on redshift
and luminosity. No compact set of eigenspectra succeeds in
describing the variations observed over the whole redshift
range. Besides, there seems to be a differential evolution with
redshift of the coefficients. Since these authors are interested
in the quasar continuum only, they do not try to recover the
exact continuum over the Lyman-α forest, and consider only the
observed flux (that is, continuum plus absorption).

Due to the large difference of redshift between the quasars 
used in S05 and those involved in any Lyman-a forest study us- 
ing SDSS spectra, one may wonder if there is any evolution in 
the continuum of quasars or any change in the correlation be- 
tween the shapes of the continuum over the forest and redwards 
to the Lyman-a emission compared to what is found by S05. 
Deriving new components at redshift 3 should answer this ques- 
tion. 

The other motivation for this work is to provide principal
components and distributions of coefficients over a larger
wavelength coverage than in previous studies: the S05 matrix allows
continua to be generated from the Lyman-β to the C IV emission lines,
while in the present work we extend the wavelength coverage
beyond the C III] emission line (up to 2000 Å in the rest frame). This
should in principle facilitate the extrapolation in the blue.

3.1. Deriving new principal components from a sub-sample
of z ~ 3 SDSS-DR7 quasar spectra

The difficulty of this analysis is to estimate the continuum in the
Lyman-α forest, where absorption can be neither neglected nor
easily removed because of the low resolution of SDSS spectra. In
particular it would be very difficult to define the true continuum
automatically.

We first selected spectra with a signal-to-noise ratio per
pixel greater than 14 redwards of the Lyman-α emission (the
SNR is computed around 1280 Å in the rest frame). We require
the redshift of the quasars to be greater than 2.82 and lower
than 3.00. The lower limit is chosen for two reasons: the
Lyman-α forest has to be complete (z > 2.7) and we noticed
that there are some issues with the flux calibration at the very
blue end of the SDSS spectra, so that we would like to avoid
this part of the spectrum. To illustrate these problems and
estimate the exact wavelength at which to start the study, we
have selected spectra where a damped Lyman-α system (DLA)
is observed with absorption redshift greater than 3.7 and with
a column density N(H I) > 10^[…] cm^-2 (the list is available
in Noterdaeme et al. 2009). In these spectra, and due to the
presence of the DLA, the flux is expected to be equal to zero
for $\lambda_{\rm obs}$ < 4280 Å. When stacking the selected lines of sight
(Fig. 2), we note instead that the flux increases for wavelengths
lower than 4000 Å. The difference with zero is as high as
0.05 for a normalized spectrum, meaning that this part of the
spectrum should probably not be used for the analysis. Since the
study of the forest is usually limited to beyond the O VI emission
line ($\lambda_{\rm rest}$ > 1050 Å), the minimum emission redshift will be
2.82. The upper limit is a compromise between the number
of spectra needed for the analysis and our ability to estimate
a continuum confidently by eye: it has been set to z = 3. BAL
quasars, lines of sight containing a DLA and any spectrum for
which fitting a continuum is too risky because of missing pixels
or reduction issues are removed from the analysis. The SDSS
spectra are observed with two cameras, one for the blue and one
for the red part of the spectrum, and we were concerned about
the presence of a possible discontinuity or break at the merging
point. We have avoided any spectra possibly affected by this.
When applying all those constraints, 78 spectra remain in the
training set (Table 1).

Once the training set is defined, spectra are smoothed and
the continuum is "hand-fitted". Redwards of the Lyman-α
emission line, we follow the different emission lines and ignore
isolated absorption lines. In the Lyman-α forest, the continuum
cannot be uniquely defined. We assume that the continuum
has a smooth shape, that it roughly follows the peaks of the
spectrum, and that the blending of lines at the SDSS resolution
is large. Points are placed around the peaks of the flux, and a
spline interpolation is used to connect them. After a first try,
we minimize the number of points used and we check that the
continuum is indeed located above blends of lines, but at about
the level of 'flat' regions. A typical example is given in Fig. 3.
Note that with this procedure, we de facto take into account that
the chosen points can be affected by some absorption. To check
that this is indeed the case, the spectra are then rebinned with
0.5 Å rest-frame pixels and the mean flux evolution is computed
and compared to the Faucher-Giguère et al. (2008) measurements
from high and medium resolution, high signal-to-noise
spectra. Both estimates are in agreement, giving us confidence in
our hand-fitted continua (Fig. 4).

The procedure described in Section 2.3 is then performed
and the correlation matrix is computed and displayed in Fig. 5.
In agreement with S05, a moderate correlation (0.3-0.6) is
found between the shape of the continuum in the forest and the
region between the Lyman-α and C IV emission lines. Thanks to the
larger rest-frame wavelength coverage of this study, a moderate
anti-correlation (from -0.6 to -0.4) between the shape of the
continuum in the forest and in the region between the C IV and C III]
emission lines is found. S05 noticed that a PCA continuum
has the correct shape in the forest but that the amplitude of the
power-law component is unstable. The anti-correlation in that
extra part of the continuum may improve the stability of the
prediction of the amplitude of the continuum in the forest. This
is discussed in more detail in Section 3.2.1.

J163842.52+360213.8   2.910   19.36   13.18   19.67
J120322.71+403310.1   2.915   19.14   11.10   15.88
J012305.58+063047.2   2.923   19.07   11.99   16.07
J135225.88+293830.4   2.915   18.43    9.32   14.35
J102807.74+172956.8   2.926   18.11   30.59   41.45
J100808.27+285214.6   2.921   18.53   10.11   14.56
J130554.81+184904.9   2.925   18.27   13.79   21.28
J152119.68-004818.7   2.934   17.93   19.39   25.50
J075326.11+403038.6   2.929   17.92   13.80   20.90
J103928.63+253345.4   2.933   18.69   11.40   14.90
J004129.80+241702.1   2.934   19.32   12.94   18.51
J223927.69+230018.0   2.928   18.41   14.47   19.71
J160441.47+164538.3   2.932   16.91   29.71   38.00
J102025.27+334633.4   2.930   18.21   13.54   18.48
J120753.79+325747.4   2.945   18.89    9.92   14.25
J141442.96+193523.5   2.946   18.32   11.46   15.61
J131212.60+001129.7   2.945   19.27    9.76   16.18
J082257.04+070104.3   2.940   18.65   11.05   15.02
J111038.63+483115.6   2.953   16.79   26.22   33.02
J130337.21+194926.7   2.953   17.84   16.79   22.49
J104253.44-001300.8   2.953   18.87   11.98   17.11
J134811.76+281801.8   2.966   17.57   26.29   39.35
J091546.67+054942.7   2.967   18.40    9.35   14.37
J125708.23+191857.2   2.970   18.61   19.54   27.02
J113559.41+422004.4   2.953   18.57    9.35   14.38
J090423.37+130920.7   2.968   17.59   22.46   30.62
J120331.29+152254.7   […]     16.99   […]     […]
J142807.87+162634.3   […]     18.32   […]     […]
J013829.75+224558.9   2.987   19.11   10.03   14.84
J085959.14+020519.7   2.970   18.45   10.52   15.43
J132255.66+391207.9   2.984   17.76   19.45   25.44
J120006.25+312630.8   2.978   16.62   29.81   34.04
J132321.24+250027.4   2.980   18.00   11.72   15.97
J125419.07+362750.4   2.980   18.51   13.68   17.43
J074313.86+28442.3    2.975   19.45   12.31   14.84
J143912.34+295448.0   2.992   17.65   17.65   24.10
J075710.36+362301.5   2.990   18.79   11.63   15.36
J224154.38+000102.1   2.998   19.12    9.57   14.26

Table 1. List of SDSS-DR7 quasars used to define the correlation matrix and the eigenvectors at z ~ 3

The mean continuum is shown in Fig. 6 together with the
mean continuum of S05 (z < 1) and the composite spectrum
derived by Vanden Berk et al. (2001). In the wavelength range
of interest here, quasars contributing to the Vanden Berk et al.
(2001) composite cover a redshift range from 2.13 to 4.789 for
the Lyman-α region and from 1.5 to 4.789 for C IV. Our mean
continuum is in excellent agreement redwards of the Lyman-α
emission line with the Vanden Berk et al. (2001) composite. The
discrepancy in the blue is simply due to the absorption in the
Lyman-α forest that Vanden Berk et al. (2001) did not try to
remove. When comparing to S05, one can see that the amplitudes
of the C IV, Lyman-α and Lyman-β emission lines relative to the
continuum are smaller in the SDSS spectra. While the
determination of the C IV emission is relatively straightforward, the
presence of absorption at the position of the Lyman-β emission
line and in the blue wing of the Lyman-α emission line makes
the continuum difficult to estimate. Thus, the main and robust
difference between the mean continua in SDSS and HST spectra
is the variation of the C IV equivalent width. Such an evolution
of the QSO emission line equivalent widths has long been
noted (Baldwin 1977) and has also been reported by Zheng et al.
(1997). The latter authors used 101 HST spectra to compute a
low-z composite spectrum (90% of the quasars had a redshift
lower than 1.5) and compared it to the Francis et al. (1991)
composite (z ~ 3). This evolution is probably related to the quasar
luminosity. Note that no evolution is found by Fan (2009) from
z ~ 2 to z > 6.

Further, the first ten eigenvectors are displayed in Fig. 7
when derived from the full wavelength coverage and in Fig. 8
when derived from the region redwards of the Lyman-α
emission, together with the components provided by S05. The
distributions of the coefficients, $c_{ij}$ and $d_{ij}$, computed on our sample
using Eq. 3 are also shown. The first component looks very
similar in the two decompositions and is dominated by the amplitude
of the Lyman-α and C IV emission lines. The most important
difference with S05 lies in the shape of the distribution of the
associated coefficients: in their study, the distribution is Gaussian
whereas in ours it is log-normal. This trend (concerning
the eigenvectors and the distributions of coefficients) is
in agreement with what has been found by Francis et al. (1992).
The discrepancy between the shapes of the coefficient
distributions could mean that the S05 sample is more homogeneous than
ours in terms of amplitude of emission lines.
The second component is dominated by the continuum slope, and
there appears to be a difference between what is found here and
in S05, which seems to be partly compensated by the
difference in the third component. Other components show small
differences but they are less pronounced and the coefficient
distributions are very similar.

Fig. 2. Result of the stacking of SDSS-DR7 spectra with a
damped Lyman-α system at an absorption redshift larger than
3.7 and a column density N(H I) > 10^[…] cm^-2. The spectra are
normalized to 1 near 1280 Å (in the quasar rest frame). Due to
the presence of the DLA, the flux is expected to be equal to
zero at observed wavelengths smaller than 4280 Å. It can be seen
from the figure that this is not the case in the very blue of the
spectrum ($\lambda_{\rm obs}$ < 4000 Å) where the mean flux is increasing.
Consequently, pixels at wavelengths below 4000 Å are not used
in this analysis.

Fig. 3. Spectrum and continuum of the quasar SDSS J134826.65+290623.0 (z = 2.822). This quasar belongs to the sample of SDSS z ~ 3
quasars that is used to derive the Principal Component Analysis eigenvectors (Section 3). Our estimate of the continuum is shown
with the thick red line and a zoom in the Lyman-α forest region is shown in the inset.

Fig. 4. The mean flux evolution in the training set (with redshift
bins of size Δz = 0.1; red triangles) is compared to the
Faucher-Giguère et al. (2008, FG08) measurements from high
resolution and high signal-to-noise spectra
(black circles). The continua of the 78 SDSS spectra in the
training set have been fitted by hand using spline interpolation.
Error bars from our measurements have been computed by
bootstrapping pixels of our sample. Both measurements are
consistent, making us confident in our continuum estimate.

Fig. 5. Correlation matrix computed with the training set. A moderate correlation (0.3-0.6) is found between the shape of the
continuum in the Lyman-α forest (1020 < λ < 1210 Å) and in the region between the Lyman-α and C IV emission lines (1216 < λ < 1600
Å), in agreement with S05. We note also a moderate anti-correlation (from -0.6 to -0.4) between the continuum in the Lyman-α
forest and the region beyond the C IV emission line.

To estimate more quantitatively the similarit y of the low an d 
high redshift sets of eigenspectra, we follow lYip et al.] (|2004|) . 
and compute the sum of the projection operators of each set of 
eigenvectors |^j >: 



j— l,m 



(9) 



and then the trace of the products of the projection operators: 
rr(S,=3Sso5E,=3) = D , (10) 

where D is the common dimension of both sets. The two
sets are disjoint if the trace is zero. If the bases are completely
alike, D should be equal to their dimension, i.e. D = 10
in our case. To compute this number, we cut our eigenspectra
at 1600 Å (rest) and we find D = 7.6. This means that the two
decompositions are similar but not exactly the same, confirming
the slight evolution of the decomposition with redshift.
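
As an illustration of Eqs. 9 and 10, the following minimal sketch computes the trace D for two sets of orthonormal vectors. The function name `subspace_overlap` and the toy bases are hypothetical and are not part of the analysis; the real computation would use the two sets of ten eigenspectra sampled on a common wavelength grid.

```python
import numpy as np

def subspace_overlap(basis_a, basis_b):
    """Return Tr(P_a P_b P_a) for two sets of orthonormal row vectors."""
    # Projection operators: sum over j of |xi_j><xi_j| (Eq. 9)
    proj_a = basis_a.T @ basis_a
    proj_b = basis_b.T @ basis_b
    # Trace of the product of the projection operators (Eq. 10)
    return float(np.trace(proj_a @ proj_b @ proj_a))

# Toy orthonormal bases in a 6-pixel space (hypothetical, for illustration)
identity = np.eye(6)
same = subspace_overlap(identity[:3], identity[:3])      # identical subspaces
disjoint = subspace_overlap(identity[:3], identity[3:])  # orthogonal subspaces
partial = subspace_overlap(identity[:3], identity[1:4])  # two shared directions
print(same, disjoint, partial)
```

In this toy case the trace equals the number of shared directions, which is the behaviour described in the text: the full dimension for identical sets and zero for disjoint sets.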

We provide in electronic form the first 10 eigenvectors
of the PCA. The distributions of associated coefficients are very
close to Gaussian functions, except for the first coefficient, the
distribution of which is fitted with a log-normal distribution.
Their characteristics are listed in Table 2.

Table 2. Parameters of the fits to the distributions of weights,
c_ij, for the first ten principal components. The distributions have
been fitted with Gaussian functions, except for the distribution
of the first coefficient which has been fitted with a log-normal
function.

                c_ij distribution
Component j     μ                  σ
 1               2.645 ± 0.008     0.145 ± 0.006
 2               0.090 ± 0.033     0.845 ± 0.033
 3              -0.220 ± 0.019     0.745 ± 0.019
 4               0.044 ± 0.031     0.717 ± 0.031
 5              -0.029 ± 0.003     0.496 ± 0.004
 6               0.028 ± 0.002     0.386 ± 0.003
 7               0.020 ± 0.003     0.363 ± 0.004
 8               0.018 ± 0.002     0.258 ± 0.008
 9               0.034 ± 0.002     0.284 ± 0.006
10               0.008 ± 0.001     0.336 ± 0.001
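
The fit parameters of Table 2 can be used to draw weights for mock continua. The sketch below is only indicative: it assumes that the two values quoted for the first component describe the logarithm of the weight, and it uses placeholder arrays (`mean_spectrum`, `eigenvectors`) in place of the published mean spectrum and eigenvectors. The function names are hypothetical.

```python
import numpy as np

# (mu, sigma) for components 1..10, copied from Table 2
FIT_PARAMS = [
    (2.645, 0.145), (0.090, 0.845), (-0.220, 0.745), (0.044, 0.717),
    (-0.029, 0.496), (0.028, 0.386), (0.020, 0.363), (0.018, 0.258),
    (0.034, 0.284), (0.008, 0.336),
]

def draw_weights(n_spectra, rng):
    """Draw PCA weights: log-normal for component 1, Gaussian otherwise."""
    weights = np.empty((n_spectra, len(FIT_PARAMS)))
    for j, (mu, sigma) in enumerate(FIT_PARAMS):
        if j == 0:
            # ASSUMPTION: mu and sigma are the mean and width of ln(c_1)
            weights[:, j] = rng.lognormal(mean=mu, sigma=sigma, size=n_spectra)
        else:
            weights[:, j] = rng.normal(loc=mu, scale=sigma, size=n_spectra)
    return weights

def mock_continua(mean_spectrum, eigenvectors, weights):
    """Mean spectrum plus the weighted sum of eigenvectors."""
    return mean_spectrum + weights @ eigenvectors

rng = np.random.default_rng(0)
n_pix = 50
mean_spectrum = np.ones(n_pix)               # placeholder mean continuum
eigenvectors = rng.normal(size=(10, n_pix))  # placeholder, NOT the published set
weights = draw_weights(1000, rng)
continua = mock_continua(mean_spectrum, eigenvectors, weights)
```

With the electronic tables in hand, the two placeholder arrays would simply be replaced by the tabulated mean spectrum and eigenvectors on the 1020-2000 Å grid.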



3.2. Quality of the predicted continuum 

The decompositions of the quasar emission investigated at z < 1
by S05 with HST spectra and at z ~ 3 in this paper with SDSS
spectra yield two similar but distinct bases of eigenvectors, as
shown in the previous Section. One would like to know if the






Paris et al.: PCA of QSO UV spectrum at z ~ 3 




Fig. 6. The solid black (resp. dashed red) line is the mean continuum of quasars at z = 3 (resp. z < 1, S05), plotted against restframe wavelength (Å). The wavelength coverage
at z = 3 corresponds to SDSS spectra and is larger than at z < 1. The main difference between the two mean spectra is seen
in the amplitude of emission lines. The composite spectrum from Vanden Berk et al. (2001) computed from 2,200 SDSS spectra
(dash-dot blue line) is in good agreement with our mean continuum. The difference in the Lyman-α forest is due to the fact that
Vanden Berk et al. (2001) did not try to avoid the absorption from the IGM. A small shift in the position of emission lines can be
noticed: this is because Vanden Berk et al. (2001) have used the Mg II line as a reference to compute the redshift whereas we have
used the C IV and C III] lines.

Fig. 7. First ten principal components of a Principal Component Analysis applied to (i) z ~ 3 SDSS quasar spectra (black solid
lines) over the range 1020-2000 Å and to (ii) z < 1 HST quasar spectra (S05, red dashed lines) over the range 1020-1600 Å. The
distributions of the coefficient associated to each component are shown in the right panel (grey histogram) together with their fit
with a Gaussian (except for the first component for which the distribution is log-normal; thick black line).

Fig. 8. First ten principal components obtained from a Principal Component Analysis of the QSO spectrum redwards of the Lyman-α
emission line, for (i) the z ~ 3 SDSS quasar spectra (black solid lines) and (ii) the z < 1 HST quasar spectra (S05, red dashed
lines). The distributions of the coefficient associated to each component are shown in the right panels (grey histogram) together with
a Gaussian fit (except for the first component for which the distribution is log-normal; thick black line).



larger wavelength coverage of our eigenvectors provides any
advantage, and to what extent the new determination is required to
reproduce the correct quasar continuum at z ~ 3. In other words, is
the prediction of the quasar continuum at z ~ 3 better if one uses
the new PCA eigenvectors derived in this paper? To answer this
question, we apply three tests to the predicted continua.

3.2.1. Error on the predicted PCA continuum in the Lyman-α
forest

In order to estimate the difference between the true and predicted
continuum in the Lyman-α forest, a set of eigenvectors and a
projection matrix are derived using 77 spectra of the training set
(out of 78), and the continuum over the Lyman-α forest for the
remaining quasar is estimated using these parameters and following
the method described in Section 2.3. That procedure is
repeated on each of the 78 spectra in the training set.
Following S05, we estimate for each spectrum the absolute fractional
flux error |δF|, defined as follows:

$$|\delta F| = \frac{\int_{\lambda_1}^{\lambda_2} \left| \frac{p(\lambda) - q(\lambda)}{q(\lambda)} \right| d\lambda}{\int_{\lambda_1}^{\lambda_2} d\lambda}, \qquad (11)$$

where p and q are, respectively, the predicted and real continua.
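
The absolute fractional flux error can be evaluated numerically as in the following minimal sketch, which integrates with the trapezoidal rule over a chosen restframe window. The function name and the toy continua are hypothetical; in practice p and q would be the predicted and hand-fitted continua of the spectrum left out of the training set.

```python
import numpy as np

def abs_fractional_flux_error(wavelength, predicted, true, lam_min, lam_max):
    """Mean of |(p - q) / q| over [lam_min, lam_max], trapezoidal rule."""
    mask = (wavelength >= lam_min) & (wavelength <= lam_max)
    lam = wavelength[mask]
    integrand = np.abs((predicted[mask] - true[mask]) / true[mask])
    # Trapezoidal integration written out explicitly
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lam))
    return integral / (lam[-1] - lam[0])

# Toy example: a predicted continuum that is 5% too high everywhere
wavelength = np.linspace(1020.0, 2000.0, 981)   # restframe wavelength in Angstrom
true = np.full_like(wavelength, 2.0)
predicted = 1.05 * true
forest_error = abs_fractional_flux_error(wavelength, predicted, true, 1050.0, 1170.0)
red_error = abs_fractional_flux_error(wavelength, predicted, true, 1280.0, 2000.0)
```

Repeating this for each of the 78 leave-one-out predictions gives the sample of errors whose cumulative distribution is discussed next.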

This has been computed over the restframe wavelength
ranges 1050-1170 Å in the forest, and 1280-2000 Å (or 1280-1600 Å)
in the red. The cumulative distributions of the absolute
fractional flux errors are plotted in Fig. 9 (in the red) and Fig. 10
(in the forest) using two different wavelength coverages (black
line for 2000 Å and grey line for 1600 Å).
In the red, the median error is 5.8% and 5% using, respectively,
the 2000 Å and the 1600 Å decompositions (Fig. 9, black dashed
and grey dotted vertical lines respectively), and the 90th percentile
error with the 2000 Å decomposition is larger than the
error using the 1600 Å set of eigenspectra. This simply indicates
that the range up to 1600 Å is easier to fit.

The important result is that the opposite trend is observed in
the forest (Fig. 10): median values (around 5%) are very close
for the two wavelength coverages (black and grey dashed vertical
lines) whereas the 90th percentiles are different (black and
grey dot-dot-dash vertical lines) with 9.5% and 12.4% errors for,
respectively, the 2000 and 1600 Å decompositions. This means
that using the full coverage reduces the number of outliers with
more than 12% error in the Lyman-α forest. Errors in S05 are
similar to those we find here but, by using the 2000 Å decomposition,
outliers are less frequent than in the previous study. To
illustrate what those numbers mean at the continuum level, examples
of spectra with their predicted continua are displayed in
Fig. 11.



3.2.2. Distribution of spectral indices



Fig. 9. Cumulative distributions of the absolute fractional flux
error (Eq. 11) redwards of the Lyman-α emission line when
predicting spectra with PCA decompositions over a wavelength
range extending up to 2000 Å (black solid line) or up to 1600 Å
(grey solid line). Median values are similar for the two estimates
(black and grey dashed vertical lines respectively). The 90th
percentile from the 1600 Å decomposition (dash-dot-dot grey vertical
line) is lower than the one for the 2000 Å decomposition
(dash-dot-dot black vertical line).

Fig. 10. Same as Fig. 9 for the cumulative distributions of the
absolute fractional flux error (Eq. 11) in the Lyman-α forest.



We can also compare the characteristics of mock continua
generated from the set of computed eigenvectors and the
distribution of weights given in Table 2 with those of
real spectra from SDSS-DR7. For this, we have fitted in the
same way a power-law to mock continua and to SDSS-DR7
spectra (see Section 4.2 for more details). The normalized
(sum equal to 1) distributions of the derived power-law index are
displayed in Fig. 12. The distribution from mock spectra is
more peaked than the one from SDSS spectra but is centered
around the same value. This behavior is expected because the
PCA gives us a mean description of the whole quasar population.
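
A minimal sketch of how a spectral index can be measured and how the normalized distribution is built is given below. It assumes a simple least-squares fit in log-log space over a line-free red window, which may differ in detail from the fitting procedure actually applied to the data; the function names and the toy continuum are hypothetical.

```python
import numpy as np

def fit_power_law_index(wavelength, flux):
    """Least-squares fit of log10(flux) = alpha * log10(wavelength) + const."""
    alpha, log_norm = np.polyfit(np.log10(wavelength), np.log10(flux), deg=1)
    return alpha, 10.0 ** log_norm

def normalized_histogram(values, bins):
    """Histogram whose bin contents sum to 1, as used for the comparison."""
    counts, edges = np.histogram(values, bins=bins)
    return counts / counts.sum(), edges

# Toy continuum with a known index (hypothetical values)
wavelength = np.linspace(1280.0, 2000.0, 200)
flux = 3.0 * (wavelength / 1280.0) ** (-1.5)
alpha, norm = fit_power_law_index(wavelength, flux)

# Toy distribution of indices, only to exercise the normalization
rng = np.random.default_rng(5)
indices = rng.normal(loc=-1.5, scale=0.3, size=500)
fractions, edges = normalized_histogram(indices, bins=20)
```

Applying the same two functions to mock continua and to observed spectra ensures that both index distributions are measured in the same way.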



3.2.3. Prediction of the continuum: fitting coefficients versus
using the projection matrix

Fig. 11. Example of a spectrum with (i) a large absolute fractional
flux error in the red and a small error in the forest (upper
panel) and (ii) a large absolute fractional flux error in the forest
and a small error in the red (lower panel). The hand-fitted
continuum is the red dashed line and the predicted one is the solid
blue line.



Our goal is to estimate the quasar continuum over the Lyman-α
forest. For this, we estimate first the weights of the red part of the
spectrum in the basis obtained from PCA of the red part of the
spectra. We then multiply these weights by the projection matrix
(see Eq. 5) to compute the weights to be used in the basis obtained
from the whole spectrum. This gives us the spectrum
reconstructed in the whole wavelength range (method 1). One
may be tempted to directly use the coefficients obtained from
the red part of the spectrum as representative of the whole spectrum,
just replacing the eigenvectors obtained from the red part
by those obtained from the full wavelength coverage (method
2). To test how useful the projection is, both methods have been
used to predict the continuum and the distributions of absolute
fractional flux error (Eq. 11) have been computed and are displayed
in Fig. 13 and Fig. 14.
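
The two reconstruction methods described above can be sketched on synthetic data as follows. This is an illustrative toy model, not the actual pipeline: the training set is a noiseless rank-3 stand-in for the hand-fitted continua, the projection matrix is obtained here by a least-squares solution (an assumption about how it is derived), and all names are hypothetical.

```python
import numpy as np

def pca_basis(spectra, n_comp):
    """Return the mean spectrum and the first n_comp eigenvectors (rows)."""
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_comp]

rng = np.random.default_rng(1)
n_train, n_pix, n_comp = 60, 40, 3
red = np.arange(n_pix) >= 15        # hypothetical "red side" pixel mask

# Synthetic rank-3 training set standing in for hand-fitted continua
spectra = 10.0 + 0.3 * rng.normal(size=(n_train, n_comp)) @ rng.normal(size=(n_comp, n_pix))

mean_full, eig_full = pca_basis(spectra, n_comp)
mean_red, eig_red = pca_basis(spectra[:, red], n_comp)

w_full = (spectra - mean_full) @ eig_full.T        # weights, full range
w_red = (spectra[:, red] - mean_red) @ eig_red.T   # weights, red side only

# Projection matrix: least-squares solution of w_full = w_red @ X
proj_matrix, *_ = np.linalg.lstsq(w_red, w_full, rcond=None)

method1 = mean_full + (w_red @ proj_matrix) @ eig_full   # with projection
method2 = mean_full + w_red @ eig_full                   # red weights used directly

def median_abs_frac_error(pred, true, mask):
    """Median over spectra of the mean |(p - q) / q| inside the mask."""
    return np.median(np.mean(np.abs((pred - true) / true)[:, mask], axis=1))

err1 = median_abs_frac_error(method1, spectra, ~red)
err2 = median_abs_frac_error(method2, spectra, ~red)
```

In this idealized case the projection recovers the blue side essentially exactly, while using the red weights directly does not, because the two sets of eigenvectors are not expressed in the same basis.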

When using method 2, and not surprisingly, the red part of
the spectrum is very well fitted with a median error less than
2.5% (Fig. 13, dashed grey line) and method 1 (projection) leads
to larger errors (dashed black line in Fig. 13, median error ~6%).
In the forest, the trend is opposite with a median error less than
5% when method 1 is applied (Fig. 14, dashed black line) and
more than 7% when method 2 is applied (dashed grey line). In
addition, with method 2, 10% of the spectra have more than 15%
error (dot-dot-dash grey line) compared with only one percent
with method 1. It is therefore apparent that the projection matrix
should be used.

Fig. 12. Distribution of spectral indices derived by fitting a
power-law to mock continua generated using the principal
components and the distributions of coefficients c_ij derived from our
SDSS-DR7 subsample (grey histogram) compared to the spectral
index distribution obtained from fitting SDSS-DR7 spectra
(black histogram).

Fig. 13. Cumulative distributions of the absolute fractional flux
error in the red part of the spectra in the training set: (i) when
the weights used to reconstruct the spectrum are those obtained
from the red part of the spectrum (method 2, grey line); (ii) when
the projection matrix is used (method 1, black line). Median
values are displayed (dashed vertical lines) together with 90th
percentiles (dot-dot-dashed vertical lines).



4. Evolution of the mean flux 

An important application of the quasar continuum estimate
over the Lyman-α forest wavelength range is the determination
of the redshift evolution of the mean flux in the
IGM. Numerous authors have performed this measurement
using high and/or intermediate spectral resolution data (e.g.
Songaila 2004; Bernardi et al. 2003; Dall'Aglio et al. 2008,
2009; Faucher-Giguère et al. 2008). The evolution is smooth
except for a possible bump at z ~ 3.2 which could be related to the
He II reionization (Schaye et al. 2000), although this is not the
only possible explanation (Faucher-Giguère et al. 2008). In this
Section we reinvestigate this issue applying the method developed
in Section 3.

Fig. 14. Same as Fig. 13 in the Lyman-α forest. Errors are smaller
when the projection matrix is used (method 1, black line, see
text). The number of outliers (spectra with large errors) is much
smaller in that case.



4.1. Comparison with B03



We would first like to check whether we can recover the Bernardi et al.
(2003) results. The sample used in the Bernardi et al. (2003)
study is a sub-sample of SDSS-DR7 containing all the spectra
observed up to the end of 2001 (corresponding to modified
Julian day MJD 52274). Some selection is applied to remove
the most prominent BALs and DLAs. These objects are not
clearly defined in Bernardi et al. (2003), so we have to apply
our own selection. We avoid all BALs and DLAs as defined by
Noterdaeme et al. (2009). Our final sample has 837 QSOs, whereas
B03 had 1041. The comparison to B03 is shown in Fig. 15 (left
hand-side panel) for power-law (triangles) and PCA (squares)
estimates of the continuum. Also shown in the figure are
the Faucher-Giguère et al. (2008) results. As expected (see next
Section), the power-law estimate is lower at z < 3 than other
estimates. The evolutions found by us and B03 are in agreement,
and a departure from a smooth evolution is seen at z ~ 3.2.

We have randomly drawn from SDSS-DR7 a large number
(500) of samples identical in size and redshift distribution
to the B03 sample. For each sample, we derived the mean
flux observed at each redshift and calculated the mean over
the 500 samples. The result is shown in Fig. 15 (right hand-side
panel). The feature at z ~ 3.2 is still seen and could be a
"bump" or a break in the evolution. Note that, as emphasized by
Faucher-Giguère et al. (2008), a bump is seen only if one insists
on fitting a single power-law.






Fig. 15. Left hand-side panel: Mean flux redshift evolution inferred from power-law (red triangles) and PCA (blue squares) estimates
of the continuum using a sample similar to Bernardi et al. (2003); the mean flux evolutions reported by Bernardi et al. (2003) and
Faucher-Giguère et al. (2008) are shown as, respectively, grey points and black open circles. Our measurements are consistent with
B03 and FG08 results. In particular, a departure from a smooth evolution is seen at z ~ 3.2. Right hand-side panel: 500 samples are
randomly drawn from the SDSS-DR7 (similar in size and redshift distribution to the B03 sample) and the mean flux evolution for
each sample is then computed. The average of these measurements is displayed (PCA continuum, blue squares) and is in excellent
agreement with the Faucher-Giguère et al. (2008) measurement.



4.2. Redshift evolution of the mean flux in the IGM using
SDSS-DR7

Continua of SDSS-DR7 spectra have been fitted using the two
different methods we have described previously. We derive a
power-law continuum and PCA continua (using our and S05 sets of
eigenspectra). We restrict our study to spectra with a signal-to-noise
ratio greater than 8 around 1280 Å in the restframe to avoid
instabilities in the fit of the power-law. We restrict the analysis to
quasars with redshifts z > 2.45 to avoid the blue end
of the spectra.

Lines of sight containing damped Lyman-α systems
(DLAs) have been removed following the lists provided by
Noterdaeme et al. (2009) and Prochaska et al. (2005). Broad
absorption line quasars (BALs) flagged by Shen et al. (2010)
have been avoided as well. After this selection, we are left with
2,576 quasars. Following Bernardi et al. (2003), we compute the
mean flux from the Lyman-α forest between 1080 and 1160 Å in
the restframe to avoid the O VI-Lyman-β and Lyman-α emission
lines.



The mean flux in the Lyman-α forest is then computed using
three different continua: two PCA continua obtained from S05
principal components and the principal components derived in
this work, and a power-law continuum. Fig. 16 shows the redshift
evolution of this quantity in redshift bins of size Δz = 0.1.
Error bars are computed from a bootstrap resampling. There is
apparently no difference in the results when using the two PCA
decompositions derived at low (z ~ 1) and high (z ~ 3) redshifts.
The mean flux derived from the power-law continuum is
systematically smaller at z < 3. This is expected, as the power-law tends
to overestimate the continuum in the forest (see Section 4.3).
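
The binned mean flux and its bootstrap error can be sketched as follows. This toy version resamples forest pixels within each redshift bin, which is an assumption made for illustration (the resampling unit could equally be the spectra); the function name and the synthetic pixel values are hypothetical. Each pixel is assumed to carry its absorption redshift and its transmitted flux, i.e. the observed flux divided by the estimated continuum.

```python
import numpy as np

def mean_flux_by_redshift(z_pix, transmission, z_edges, n_boot, rng):
    """Mean transmitted flux per redshift bin with bootstrap errors."""
    n_bins = len(z_edges) - 1
    mean = np.full(n_bins, np.nan)
    err = np.full(n_bins, np.nan)
    which = np.digitize(z_pix, z_edges) - 1    # bin index of each pixel
    for k in range(n_bins):
        values = transmission[which == k]
        if values.size == 0:
            continue                           # leave empty bins as NaN
        mean[k] = values.mean()
        # Bootstrap: resample the pixels of the bin with replacement
        resampled = rng.choice(values, size=(n_boot, values.size), replace=True)
        err[k] = resampled.mean(axis=1).std(ddof=1)
    return mean, err

# Toy data: flat transmission of 0.7 with Gaussian scatter (hypothetical)
rng = np.random.default_rng(42)
z_pix = rng.uniform(2.0, 4.0, size=20000)
transmission = 0.7 + rng.normal(scale=0.1, size=z_pix.size)
z_edges = np.linspace(2.0, 4.0, 21)            # bins of width 0.1
mean_flux, mean_flux_err = mean_flux_by_redshift(z_pix, transmission, z_edges, 200, rng)
```

Running the same function on transmissions computed with the PCA and power-law continua gives directly comparable evolution curves.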

It is apparent that there is a change in the evolution of the 
mean flux at z ~ 3 with a steepening of the evolution at large 
redshift. 

To estimate what kind of feature we are able to recover with 
our procedure, we construct in the following DR7-like samples 
of simulated intermediate resolution quasar spectra probing an 
IGM with a mean flux evolution as seen in the high resolution 
data and test if we can recover this evolution by applying our 
procedures. 

4.3. Detectability of the bump with DR7

In order to test the detectability of a feature equivalent to the
bump seen at z ~ 3.2 by Faucher-Giguère et al. (2008), we
performed simulations of this effect and its measurement with mock
spectra.



4.3.1. Mock spectra 

Fig. 16. Redshift evolution of the mean flux from SDSS DR7
quasar spectra. Three different estimates of the continuum are
assumed (Section 2.1): (i) an extrapolation of a power-law fit
(see Section 2, red triangles); or a prediction (Eq. 2) using the
output of a Principal Components Analysis of (ii) SDSS spectra
at z ~ 3 as described in Section 3 (blue squares) or of (iii) HST
spectra at z < 1 (S05; orange diamonds). Error bars are computed
from bootstrapping and are at the 3σ level. A change in
the evolution can be noticed at z ~ 3 in the sense of a steeper
slope at high redshift. A featureless evolution (fitted from z < 3
points using z ~ 3 PCA measurement) is shown for guidance.



We want to simulate the Lyman-α forest that will give a mean
flux redshift evolution similar to what is seen in high resolution
data. This evolution was fitted by Faucher-Giguère et al. (2008):

$$F(z) = \exp\left\{-A(1+z)^{B} - C \exp\left(-\frac{[(1+z)-D]^2}{2E^2}\right)\right\} \qquad (12)$$

where A = 0.00153, B = 4.060, C = -0.0969, D = 4.267 and
E = 0.0769.
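
The mean flux evolution used as input of the simulations can be evaluated with the short sketch below. The functional form is an assumption: an effective optical depth made of a power law plus a Gaussian term in (1+z), with F = exp(-τ_eff), using the parameter values A to E quoted in the text. The function names are hypothetical.

```python
import numpy as np

# Parameter values quoted in the text for the Faucher-Giguere et al. (2008) fit
A, B, C, D, E = 0.00153, 4.060, -0.0969, 4.267, 0.0769

def effective_optical_depth(z):
    """ASSUMED form: power law plus Gaussian feature centred at 1 + z = D."""
    zp1 = 1.0 + np.asarray(z, dtype=float)
    return A * zp1**B + C * np.exp(-((zp1 - D) ** 2) / (2.0 * E**2))

def mean_flux(z):
    """Mean transmitted flux F(z) = exp(-tau_eff)."""
    return np.exp(-effective_optical_depth(z))

z_grid = np.linspace(2.0, 4.2, 221)
flux_with_bump = mean_flux(z_grid)
flux_smooth = np.exp(-A * (1.0 + z_grid) ** B)   # same model without the feature
# Redshift at which the feature raises the mean flux the most
z_peak = z_grid[np.argmax(flux_with_bump - flux_smooth)]
```

With these values the Gaussian term is negative, so it lowers the optical depth and produces a local excess of mean flux close to z = D - 1.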



We assume that the Lyman-α forest is made up of absorption
lines with a column density distribution

$$\frac{dn}{dN_{\rm HI}} = a\, N_{\rm HI}^{-\beta} \qquad (13)$$

with a = 4.9 × 10^… and β = 1.46 over the column density range
10^… - 10^… cm^-2, and a Doppler parameter distribution given by:

$$\frac{dn}{db} = K\, \frac{b_\sigma^4}{b^5} \exp\left(-\frac{b_\sigma^4}{b^4}\right) \qquad (14)$$

with K = 6.82 and b_σ = 24.09 km s^-1 (Kim et al. 2001).
Therefore the evolution in F(z) is assumed to be due to an
evolution in the number of clouds per unit redshift. This is
probably oversimplistic because the feature at z ~ 3.2, if any, is
claimed to be possibly due to an ionization process, but it is
probably sufficient for what we want to estimate.
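
Column densities and Doppler parameters can be drawn by inverse-transform sampling, as sketched below. Two assumptions are made and should be checked against the original equations: the column density distribution is taken to be a pure power law of index β = 1.46, and the Doppler parameter distribution is taken to have the form b^-5 exp[-(b_σ/b)^4]. The column density bounds used in the example are placeholders, not values from the text, and the function names are hypothetical.

```python
import numpy as np

BETA = 1.46        # power-law index quoted in the text
B_SIGMA = 24.09    # km/s, quoted in the text

def draw_column_densities(n, n_min, n_max, rng):
    """Inverse-transform sampling of dn/dN proportional to N**(-BETA)."""
    u = rng.random(n)
    p = 1.0 - BETA
    return (n_min**p + u * (n_max**p - n_min**p)) ** (1.0 / p)

def draw_doppler_parameters(n, rng):
    """Sampling of the ASSUMED form dn/db ~ b**-5 * exp(-(B_SIGMA / b)**4).

    Its cumulative distribution is exp(-(B_SIGMA / b)**4), which inverts
    analytically.
    """
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=n)   # avoid log(0)
    return B_SIGMA * (-np.log(u)) ** (-0.25)

rng = np.random.default_rng(7)
# PLACEHOLDER bounds in cm^-2: the range quoted in the text is not used here
column_densities = draw_column_densities(20000, 1e12, 1e16, rng)
doppler = draw_doppler_parameters(20000, rng)
```

Under the assumed Doppler distribution the median is b_σ (ln 2)^(-1/4), slightly above b_σ itself.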

We first construct the relation which gives, at a given redshift,
the mean flux versus the number of clouds present in a redshift
bin of Δz = 0.1. This relation is parametrized by:



(15) 



where n_abs is the number of clouds drawn at random from the
above population of clouds, and the parameters of the relation are
determined by the simulation and depend on redshift.

Combining Eqs. 12 and 15, we then derive at each redshift
the actual number of clouds that is needed to reproduce the
relation given by Faucher-Giguère et al. (2008). We finally fit the
redshift evolution of the number of clouds by a similar function:

$$\frac{dn}{dz} = n_0 (1+z)^{\gamma} + C \exp\left(-\frac{[(1+z)-D]^2}{2E^2}\right). \qquad (16)$$

Our best fit gives the values n_0 = 8.281 ± 0.080,
γ = 3.076 ± 0.006, C = -121.3 ± 4.5, D = 4.267 ± 0.003 and
E = -0.081 ± 0.003.


Once the number of clouds per unit redshift is correctly
calibrated, we generate 50,000 mock spectra with a uniform emission
redshift distribution in the range 2.3-4.5. For a given emission
redshift, the number of absorption lines is computed from
the line number density and is modulated to introduce Poisson
noise. For each absorption line, the column density N_HI and the
Doppler parameter are randomly chosen following Eq. 13 and
Eq. 14. The spectrum is then degraded to the SDSS resolution
(R ~ 1800) and a PCA continuum is added using 10 principal
components and choosing the weights at random within
the calculated distributions. The wavelength scale is binned as
for SDSS spectra and noise is added following the SDSS g-magnitude
distribution.
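
The first step of the mock generation, drawing the absorber redshifts along one line of sight, can be sketched as follows. The sketch assumes that Eq. 16 is a power law plus a Gaussian term giving the number of lines per unit redshift, and that the forest extends from the Lyman-β to the Lyman-α emission line of the quasar; both are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

# Best-fit values quoted in the text for Eq. 16
N0, GAMMA, C, D, E = 8.281, 3.076, -121.3, 4.267, -0.081
LYA, LYB = 1215.67, 1025.72   # rest wavelengths of Lyman-alpha and Lyman-beta (Angstrom)

def line_density(z):
    """ASSUMED form of Eq. 16: lines per unit redshift."""
    zp1 = 1.0 + np.asarray(z, dtype=float)
    return N0 * zp1**GAMMA + C * np.exp(-((zp1 - D) ** 2) / (2.0 * E**2))

def draw_line_redshifts(z_min, z_max, rng, n_grid=2000):
    """Poisson-sample absorber redshifts between z_min and z_max."""
    z = np.linspace(z_min, z_max, n_grid)
    density = line_density(z)
    # Cumulative expected number of lines (trapezoidal rule)
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(z)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    n_lines = rng.poisson(cumulative[-1])      # Poisson fluctuation of the total
    # Inverse-CDF sampling of the individual redshifts
    return np.interp(rng.random(n_lines) * cumulative[-1], cumulative, z)

rng = np.random.default_rng(3)
z_em = 3.0
z_min = (1.0 + z_em) * LYB / LYA - 1.0         # forest assumed to start at Lyman-beta
z_abs = draw_line_redshifts(z_min, z_em, rng)
```

Each redshift would then be paired with a column density and a Doppler parameter to build the absorption profile before degrading the spectrum to the survey resolution.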

To check the validity of our procedures, the mean flux evolution
is computed from spectra with no noise and no continuum
added (see Fig. 17). The mean flux evolution recovered by our
procedure (grey diamonds) is in excellent agreement with the
theoretical input (black dashed line), assumed to follow the
evolution derived by Faucher-Giguère et al. (2008).

4.3.2. Should we detect any feature with DR7? 

We compute 100 mock samples with the same number of quasars
as the SDSS-DR7 and the same distributions of emission redshift
and signal-to-noise ratio. For each sample, we compute the mean
flux evolution, fitting the quasar continuum with a power-law as
performed on real data. At each redshift, we compute the scatter in
the mean flux derived from these samples. The recovered evolution
is shown as black points in Fig. 17, to be compared with
the dashed line showing the input assumed for the simulation.
As already mentioned, it is apparent that the power-law fit of
the QSO continuum underestimates the mean flux. However, the
bump at z ~ 3.2, introduced in the input, is recovered although
slightly smoothed out by the procedure. The errors derived from
the simulations are shown as a grey area. They have to be compared
with errors expected from the data. The latter are estimated
using the errors obtained in SDSS-DR7 from B03-like samples
(as in Fig. 15) but scaled by the square root of the ratio of the
number of quasars in B03 and SDSS-DR7 samples. These errors
are shown in Fig. 17 by vertical error bars. They are, as expected,
larger than the errors from the simulations. Mock spectra are indeed
idealized, and additional sources of uncertainty are present
in the calibration of the data and in the consequences on the
continuum fit of the somewhat odd shape of some quasar spectra.

4.3.3. Feature at z ~ 3.2 

We summarize in Fig. 18 the mean flux evolution measured from
the SDSS-DR7 data for a continuum estimated with a PCA (blue
squares) or a power law (red triangles) corrected for the systematic
bias seen in Fig. 17. Indeed, the power-law continuum
systematically overestimates the amount of absorption in the
Lyman-α forest. The mean flux evolution derived with a power-law
continuum is corrected by the expected difference seen in
mocks between the measured mean flux and the input of the
simulation.

Fig. 17. Redshift evolution of the mean flux as measured from
power-law continuum fitting of mock quasar spectra (black
points and grey area). The evolution of the mean flux assumed as
an input of the simulation is taken from Faucher-Giguère et al.
(2008) (black dashed line). As already mentioned, the mean flux
is underestimated by the power-law procedure but the shape of
the evolution is recovered. Vertical grey bars are the errors expected
from the data. They are computed from the errors derived
in the B03 sample (see Fig. 15), scaled with respect to the
different number of spectra in the SDSS-DR7 and B03 samples.
They are, as expected, larger than the errors derived from the
simulations (grey area).
There is an excellent agreement between the results of the
two methods. Overplotted as black circles in Fig. 18 are the results
of Faucher-Giguère et al. (2008), in very good agreement
with the PCA estimate except, perhaps, in the bin around z ~ 3.2.
Note that the discrepancy is less than 2σ. In any case, it is apparent
that the smooth redshift evolution of the mean flux becomes
steeper around redshift z ~ 3.

The slight difference between high and intermediate resolution
data at z ~ 3.2 may be explained by a different selection
of the quasars. Indeed, Worseck & Prochaska (2011) have argued
that SDSS preferentially selects 3 < z_em < 3.5 quasars with
intervening H I Lyman limit systems. However, this should have
little influence on the mean flux in the whole forest and cannot
explain the discrepancy by itself. The difference could also be due to
the possibility that our procedure smoothes out a sharp feature.
However, this would be surprising given the width of the bins
and the results of our simulations (see Fig. 17).



5. Conclusion 

The first goal of this paper is to provide a new PCA decomposition
of the continuum of z ~ 3 quasars. This should be useful
for studies of the Lyman-α forest and to generate mock quasar
continua to be implemented in simulations constructed to search
for systematic effects in future analyses or surveys. We took the
opportunity to enlarge to 1020-2000 Å the wavelength range
over which the spectra are decomposed, to be compared with
the previous wavelength range 1020-1600 Å used by S05. The
mean spectrum at z ~ 3 has a similar shape to the mean spectrum
derived at z ~ 1 by S05, except that the strength of the Lyman-α
and C IV emission lines relative to the continuum is smaller at
high redshift. Our work concentrates on the estimate of the continuum
in the Lyman-α forest and we provide all outputs of this
analysis that are required to generate mock continua.

Fig. 18. Mean flux redshift evolution inferred from SDSS-DR7
quasars (red triangles: corrected power-law continuum, and blue
squares: z ~ 3 PCA continuum) compared to the evolution from
Faucher-Giguère et al. (2008) (black open circles). The mean
flux evolution derived with a power-law continuum is corrected
for the bias predicted in simulations (see Fig. 17). All the measurements
are in agreement with each other within errors. No
"bump" is seen in any of the evolutions inferred from SDSS-DR7,
but there is a definite change in the slope of the evolution at z ~ 3
in the sense of a steeper evolution beyond this redshift.

We use this decomposition to revisit the evolution with redshift
of the mean flux in the Lyman-α forest and to compare two
methods to estimate the quasar flux in the Lyman-α forest: the
extrapolation of a power-law fitted to the red part of the spectrum
and an estimate of the flux from PCA coefficients.
We find that:

(i) the power-law method systematically underestimates the
mean flux by an amount decreasing with redshift. When correcting
for this bias, as estimated with simulations, we find that the
method gives results similar to those of the PCA method;

(ii) the PCA method yields results very similar to what is measured
by Bernardi et al. (2003) and from high spectral resolution
data (Faucher-Giguère et al. 2008);

(iii) from our simulations, a bump at z ~ 3.2, if present, should
be marginally detected with the data set of SDSS-DR7;

(iv) finally, from our analysis, we find that there is a definite
break in the evolution of the mean flux at z ~ 3 in the sense of
a steeper decrease of the mean flux at high redshift. We caution
that this could be a consequence of a more prominent bump
being slightly smoothed out by our procedures.

The increase in statistics, but most importantly in the
quality of the data, that will soon be provided by the BOSS survey
should definitely settle this issue.